


NPS55-78-033

NAVAL POSTGRADUATE SCHOOL 

Monterey, California 




SIMPLE MODELS FOR POSITIVE-VALUED 
AND DISCRETE-VALUED TIME SERIES
WITH ARMA CORRELATION STRUCTURE 

by

P. A. W. Lewis 

November 1978 

Approved for public release; distribution unlimited. 







Rear Admiral T. F. Dedman, Superintendent
J. R. Borsting, Provost

Reproduction of all or part of this report is authorized.

This report was prepared by:



UNCLASSIFIED 



SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered) 



REPORT DOCUMENTATION PAGE 




1. REPORT NUMBER: NPS55-78-033

4. TITLE (and Subtitle): Simple Models for Positive-Valued and Discrete-Valued
   Time Series with ARMA Correlation Structure

5. TYPE OF REPORT & PERIOD COVERED: Technical

7. AUTHOR(s): P. A. W. Lewis

9. PERFORMING ORGANIZATION NAME AND ADDRESS: Naval Postgraduate School,
   Monterey, Ca. 93940

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: 61153N,
    RR014-05-01; NA001478WR80035

11. CONTROLLING OFFICE NAME AND ADDRESS: Office of Naval Research,
    Arlington, VA 22217

12. REPORT DATE: November 1978

16. DISTRIBUTION STATEMENT (of this Report): Approved for public release;
    distribution unlimited.

19. KEY WORDS: Models; Discrete-Valued Time Series; Positive-Valued Time
    Series; ARMA correlation; ARMA processes; Point processes; Marginal
    distribution; EARMA-type processes; DARMA-type processes; Autoregressive
    processes; Moving average processes

20. ABSTRACT

Three models for positive-valued and discrete-valued stationary time series
are discussed. All have the property that for a range of specified marginal
distributions the time series have the same correlation structure as the
usual linear, autoregressive-moving average (ARMA) model. The models differ
in the range of marginal distributions which can be accommodated and in the
simplicity and flexibility of each model. Specifically the EARMA-type
processes can be extended from the exponential distribution to a rather
narrow range of continuous distributions; the DARMA-type processes can be
defined usefully for any discrete marginal distribution and are simple and
flexible. Finally the marginally controlled semi-Markov generated process can
be defined for any continuous or discrete positive-valued distribution and is
therefore very flexible. However the model suffers from some complexity and
parametric obscurity.



SIMPLE MODELS FOR POSITIVE-VALUED AND DISCRETE-VALUED TIME SERIES
WITH ARMA CORRELATION STRUCTURE




P. A. W. Lewis 



Department of Operations Research 
Naval Postgraduate School 
Monterey, CA 93940 



Paper presented at the Fifth International Symposium on Multivariate Analysis. 
To appear in the Proceedings, published by North-Holland, Amsterdam.




Abstract 

Three models for positive-valued and discrete-valued stationary time
series are discussed. All have the property that for a range of specified
marginal distributions the time series have the same correlation structure
as the usual linear, autoregressive-moving average (ARMA) model. The
models differ in the range of marginal distributions which can be
accommodated and in the simplicity and flexibility of each model. Specifically
the EARMA-type processes can be extended from the exponential distribution
to a rather narrow range of continuous distributions; the DARMA-type
processes can be defined usefully for any discrete marginal distribution
and are simple and flexible. Finally the marginally controlled
semi-Markov generated process can be defined for any continuous or discrete
positive-valued distribution and is therefore very flexible. However
the model suffers from some complexity and parametric obscurity.



Research supported by National Science Foundation Grant AF476 and Office 
of Naval Research Grant NR-42-284 at the Naval Postgraduate School. 



1. Introduction



In much of the current work on the analysis of stationary time series 
there is an implicit assumption that the marginal distribution of the time 
series is normal. The assumption is implicit in that the marginal distri- 
bution is not considered to be of interest per se in the analysis, and 
also in that the statistical procedures which are used are very definitely 
based on normality assumptions. The stationary model on which much of 
this time series analysis is based is the mixed autoregressive moving 
average process, 



$$a_0 X_i + a_1 X_{i-1} + \cdots + a_p X_{i-p} = b_0 \epsilon_i + b_1 \epsilon_{i-1} + \cdots + b_q \epsilon_{i-q}, \qquad i = 0, \pm 1, \pm 2, \ldots \tag{1.1}$$



sometimes called the ARMA(p,q) or Box-Jenkins process. The process (1.1)
is specified quite generally as a linear combination of i.i.d. random
variables $\{\epsilon_i\}$ of unspecified distribution, the linear, additive
structure determining the correlation structure of the stationary sequence
$\{X_i\}$ under well-known restrictions on the parameters. If one wants $\{X_i\}$
to be a time series with a normal marginal distribution, this
can be accomplished by taking the $\epsilon_i$'s to be normally distributed. The
model is then completely specified.

There are, however, many situations in which observations occur 
serially and in which the marginal distribution is patently non-normal. 

For example, data on the number of occurrences of all known diseases in 
each week is kept by the National Center for Health Statistics. The data 
is not only discrete count data, but for many diseases it is mostly on 
the order of 0, 1, 2, 3, and very seldom above this. 
It has been suggested that such non-normal data be handled by data
transformations and this is probably appropriate if the data is only 
slightly non-normal. In other cases it seems reasonable to start afresh 
and develop models from scratch. In this paper we summarize attempts to 
do this for stationary time series which are known to be non-normal because 
of either positivity or discreteness or both. The essence of the models is 
that the marginal distribution is specified, as well as the correlation 
structure. More generally the models are required to be simple and 
flexible in the following senses: 

a) The models should be specified in terms of easily observed and 
measured quantifiers. When the models are stationary, these quan- 
tifiers would typically be 

i) the marginal distribution, and 
ii) second-order moments (correlations) . 

b) The models should be parametrically parsimonious and hopefully 
parametrically meaningful. 

c) The models should be easy to generate on computers, i.e., they should 
be structurally simple; in fact it might be preferable for the 
models to have linear structure. 

d) The models should be easy to fit to data, both informally and 
formally. 

The model (1.1) certainly has most of the above features, but it is
not known in general how to specify the distribution of $\epsilon_i$ so as to
produce a given, continuous marginal distribution for the $X_i$'s. Moreover,
it is clearly not possible to do this at all if the $X_i$'s are
discrete random variables.
The work described in this paper on non-normal time series is joint
work with D. P. Gaver, P. A. Jacobs and A. J. Lawrance. Although the
work has much broader connotation, it will be described in the context
in which it arose, that of the description of stochastic point processes,
or series of events occurring in time. One way in which these point
processes can be described is as a sequence of intervals between events
$\{X_i\}$, which are of course positive-valued random variables. In the
common case of a Poisson point process the $X_i$'s have an exponential
distribution. However, as in the case of epidemics, point processes
are generally observed as counts of events in successive fixed intervals
and these are non-negative discrete-valued random variables. For the
Poisson process these counts are independent and Poisson distributed and
this serves as the null model in the analysis of count data from point
processes.

Three distinct models are discussed in the context of the analysis 
and description of point processes. All of them satisfy the requirements 
discussed above to some degree. The EARMA-type process described first 
has recently been extended to have a complete ARMA-type correlation 
structure, but the process cannot be extended to all continuous marginal 
distributions. Marginally controlled semi-Markov generated processes, on 
the other hand, give a complete analog to (1.1), but they do not have 
linear structure. They can also be extended to give processes with 
discrete marginal distributions. A simpler, random linear structure 
has been derived, however, which gives discrete processes with ARMA 
structure. These are DARMA-type processes and come closer than the 
other processes to fulfilling the requirements of simplicity and flex- 
ibility. 

Further details on the processes are to be found in the references. 
2. Interval Models: Sequences of continuous positive-valued random variables

Univariate point processes in continuous time can be described equally
well through the structure of the intervals between events $\{X_i\}$, where the
$X_i$'s are continuous and positive-valued random variables, or the counting
process $\{N(t)\}$, where N(t) gives the number of events in (0,t] and is
discrete and non-negative. We discuss the modelling of the intervals
$\{X_i\}$ first. Of course the applications of the models are much broader;
the $X_i$'s might for instance be the magnitudes of successive shocks in
a sequence of earthquakes or the successive response times of a computer
to messages sent via a terminal.

2.1. The first-order autoregressive exponential model (EAR(1))

In a Poisson process the intervals $\{X_i\}$ are independent and identically
distributed (i.i.d.) random variables with exponential distribution

$$F_X(x) = 1 - e^{-\lambda x}, \qquad \lambda > 0;\ x > 0. \tag{2.1}$$

Several attempts have been made to generalize the Poisson process by
making the $X_i$ dependent, but with exponential or conditionally
exponential marginal distributions (Cox, 1955). The simplest and only
really successful attempt in the sense of broad applicability (Gaver
and Lewis, 1978) gives a process called the EAR(1) model, derived from
the following consideration.

A first-order autoregressive stochastic sequence is defined by the
stochastic difference equation (a special case of (1.1))

$$X_i = \rho X_{i-1} + \epsilon_i, \qquad i = 0, \pm 1, \pm 2, \ldots;\ |\rho| < 1, \tag{2.2}$$
where the $\epsilon_i$ are assumed to be an i.i.d. stationary random sequence.
If the $\epsilon_i$ are normally distributed, so are the $X_i$. What must the
distribution of the $\epsilon_i$ be in order for the sequence $\{X_i\}$ to be
stationary with an exponential($\lambda$) marginal distribution? The answer is
surprisingly easy (Gaver and Lewis, 1978).

Let $0 < \rho < 1$, and let $\{E_i\}$ be an i.i.d. exponential($\lambda$) sequence.
Now let $\epsilon_i$ be equal to zero with probability $\rho$ and equal to $E_i$
with probability $1-\rho$. Then we have

$$X_i = \begin{cases} \rho X_{i-1} & \text{with probability } \rho, \\ \rho X_{i-1} + E_i & \text{with probability } 1-\rho, \end{cases} \tag{2.3}$$

$$X_i = \rho X_{i-1} + V_i E_i, \tag{2.4}$$

where $\{V_i\}$ is an i.i.d. binary sequence and $P\{V_i = 0\} = 1 - P\{V_i = 1\} = \rho$.
Moreover if we let $X_0 = E_0$, and define $X_i$ as in (2.3), the resulting
sequence is stationary for $i = 0, 1, \ldots$.

The point process with the interval structure (2.3) is called the
EAR(1) point process. It is a tractable model, and most of its important
properties are given in Gaver and Lewis (1978). In particular we have
that $\rho(k) = \rho^k$. This model is in a sense degenerate because it contains
runs of $X_i$ in which values are exactly $\rho$ times the previous
value; it could, however, be a reasonable model for point processes
observed in computer systems (e.g., inter-arrival times of requests to
a storage subsystem) in which the intervals have exponential marginal
distributions but are dependent. Note that as defined the model can only
provide sequences $\{X_i\}$ with positive serial correlations. We can,
however, define the process to include negative correlations (Gaver and 
Lewis, 1978); there is also a way to obviate the degeneracy (Lawrance, 
1978). 
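The EAR(1) recursion is easy to simulate, which also gives a quick empirical check that the marginal is exponential($\lambda$) and that $\rho(k) = \rho^k$. The sketch below is in Python; the helper names `ear1` and `lag_corr`, the seed and the parameter values are illustrative choices of ours, not part of the report.

```python
import random


def ear1(n, lam, rho, rng):
    """Hypothetical helper: n values of the EAR(1) sequence.

    X_0 = E_0 ~ exponential(lam); then X_i = rho * X_{i-1} + V_i * E_i,
    where V_i = 0 with probability rho and V_i = 1 with probability 1 - rho.
    """
    x = rng.expovariate(lam)
    out = [x]
    for _ in range(n - 1):
        e = rng.expovariate(lam)
        v = 0 if rng.random() < rho else 1
        x = rho * x + v * e
        out.append(x)
    return out


def lag_corr(xs, k):
    """Sample lag-k serial correlation (divisor n in numerator and denominator)."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    cov = sum((xs[i] - m) * (xs[i + k] - m) for i in range(n - k)) / n
    return cov / var


rng = random.Random(1978)
xs = ear1(200_000, lam=2.0, rho=0.6, rng=rng)
print(sum(xs) / len(xs))                 # close to 1/lam = 0.5
print(lag_corr(xs, 1), lag_corr(xs, 2))  # close to rho = 0.6 and rho**2 = 0.36
```

Setting `rho=0.0` makes every $V_i = 1$, so the sketch then produces i.i.d. exponential intervals, that is, a Poisson process.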

Simple generalizations of this first-order, autoregressive, Markovian 
exponential process are the following. 

2.2. The moving average exponential model (EMA(q))

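We define another stationary sequence $\{X_i\}$, using the $\{E_i\}$ sequence
above, according to

$$X_i = \begin{cases} \beta E_i & \text{w.p. } \beta, \\ \beta E_i + E_{i-1} & \text{w.p. } 1-\beta, \end{cases} \tag{2.5}$$

that is,

$$X_i = \beta E_i + U_i E_{i-1}, \qquad i = 1, 2, \ldots;\ 0 \le \beta \le 1, \tag{2.6}$$

where $\{U_i\}$ is an i.i.d. binary sequence in which $U_i = 1$ with probability
$1-\beta$. This is a first-order exponential moving average
process (EMA(1)) (Lawrance and Lewis, 1977) which is one-dependent;
in particular

$$\rho(1) = \beta(1-\beta), \tag{2.7}$$

$$\rho(k) = 0, \qquad k = 2, 3, \ldots. \tag{2.8}$$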

Properties of the EMA(1) process are given by Lawrance and Lewis (1977).
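A corresponding Python sketch for the EMA(1) sequence (2.6), again with hypothetical helper names and illustrative parameter values, compares the sample lag-1 and lag-2 correlations with (2.7) and (2.8).

```python
import random


def ema1(n, lam, beta, rng):
    """Hypothetical helper: n values of X_i = beta*E_i + U_i*E_{i-1}.

    U_i = 1 with probability 1 - beta and 0 otherwise, independently of the E's.
    """
    e_prev = rng.expovariate(lam)  # plays the role of E_0
    out = []
    for _ in range(n):
        e = rng.expovariate(lam)
        u = 1 if rng.random() < 1 - beta else 0
        out.append(beta * e + u * e_prev)
        e_prev = e
    return out


def lag_corr(xs, k):
    """Sample lag-k serial correlation."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    cov = sum((xs[i] - m) * (xs[i + k] - m) for i in range(n - k)) / n
    return cov / var


xs = ema1(200_000, lam=1.0, beta=0.3, rng=random.Random(1977))
print(lag_corr(xs, 1), 0.3 * (1 - 0.3))  # sample value vs. beta(1 - beta) from (2.7)
print(lag_corr(xs, 2))                   # (2.8): should be near 0
```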

It is easy to see that we can make $E_{i-1}$ in (2.6) a random linear
combination of $E_{i-1}$ and $E_{i-2}$ to get an EMA(2) process, and can continue
the process back q steps to obtain an EMA(q) process. The
general EMA(q) model takes the form

$$X_i = \begin{cases}
\beta_q E_i & \text{w.p. } b_{q+1}, \\
\beta_q E_i + \beta_{q-1} E_{i-1} & \text{w.p. } b_q, \\
\quad \vdots & \quad \vdots \\
\beta_q E_i + \beta_{q-1} E_{i-1} + \cdots + \beta_1 E_{i-q+1} & \text{w.p. } b_2, \\
\beta_q E_i + \beta_{q-1} E_{i-1} + \cdots + \beta_1 E_{i-q+1} + E_{i-q} & \text{w.p. } b_1,
\end{cases} \tag{2.9}$$

for $0 \le \beta_1, \ldots, \beta_q \le 1$; $i = 0, \pm 1, \pm 2, \ldots$, and

$$b_i = \begin{cases}
\beta_q, & i = q+1, \\
(1-\beta_q) \cdots (1-\beta_i)\, \beta_{i-1}, & q \ge i \ge 2, \\
(1-\beta_q) \cdots (1-\beta_1), & i = 1.
\end{cases} \tag{2.10}$$

Note that the $\beta_i$'s can be obtained uniquely from the $b_i$'s; there are
$q+1$ $b_i$'s but only $q$ $\beta$'s, since the sum of the $b_i$'s is equal to one.

This model is clearly only q-dependent; in particular the correlations
for the EMA(q) process are

$$\rho^{(q)}(k) = \operatorname{corr}(X_i, X_{i-k}) = \begin{cases} \sum_{v=1}^{q-k+1} b_v b_{v+k}, & 1 \le k \le q, \\ 0, & q+1 \le k < \infty. \end{cases} \tag{2.11}$$

Thus the serial correlations are just lagged products of the $b_i$ sequence
and the formula (2.11) is completely analogous to the formula for the
serial correlations of the standard MA(q) process; see Box and Jenkins
(1970, p. 68). It can be seen from (2.11) that all the correlations are
nonnegative and it may be further shown that they are bounded above by
1/4.
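The weights (2.10) and the correlations (2.11) are easy to compute, and a simulation of (2.9) gives an independent check. In this Python sketch, which is ours, `betas` lists $\beta_1, \ldots, \beta_q$ in that order; the helper names and parameter values are hypothetical. For q = 1 the functions reproduce $\rho(1) = \beta(1-\beta)$ from (2.7).

```python
import random


def ema_weights(betas):
    """Mixing probabilities of (2.10), returned as {i: b_i} for i = 1..q+1.

    betas = [beta_1, ..., beta_q]; b_{q+1} = beta_q,
    b_i = (1 - beta_q)...(1 - beta_i) * beta_{i-1} for q >= i >= 2,
    b_1 = (1 - beta_q)...(1 - beta_1).
    """
    q = len(betas)
    beta = {j: betas[j - 1] for j in range(1, q + 1)}  # 1-based indexing
    b = {q + 1: beta[q]}
    tail = 1.0  # running product (1 - beta_q)...(1 - beta_i)
    for i in range(q, 1, -1):
        tail *= 1 - beta[i]
        b[i] = tail * beta[i - 1]
    b[1] = tail * (1 - beta[1])
    return b


def ema_corr(betas, k):
    """Serial correlation rho(k) of the EMA(q) process from (2.11)."""
    q = len(betas)
    if k > q:
        return 0.0
    b = ema_weights(betas)
    return sum(b[v] * b[v + k] for v in range(1, q - k + 2))


def ema_q(n, lam, betas, rng):
    """Simulate (2.9): keep beta_q E_i, beta_{q-1} E_{i-1}, ... up to a random stopping lag."""
    q = len(betas)
    b = ema_weights(betas)
    coef = [betas[q - 1 - m] for m in range(q)] + [1.0]  # coefficient of E_{i-m}
    stop = [b[q + 1 - m] for m in range(q + 1)]          # P(sum ends at the E_{i-m} term)
    e = [rng.expovariate(lam) for _ in range(n + q)]
    out = []
    for i in range(q, n + q):
        m_stop = rng.choices(range(q + 1), weights=stop)[0]
        out.append(sum(coef[m] * e[i - m] for m in range(m_stop + 1)))
    return out


def lag_corr(xs, k):
    """Sample lag-k serial correlation."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    cov = sum((xs[i] - m) * (xs[i + k] - m) for i in range(n - k)) / n
    return cov / var


betas = [0.3, 0.6]  # beta_1, beta_2 (illustrative values)
print(ema_weights(betas))  # b_3 = 0.6, b_2 = 0.4 * 0.3, b_1 = 0.4 * 0.7
print(ema_corr(betas, 1), ema_corr(betas, 2))
xs = ema_q(100_000, 1.0, betas, random.Random(68))
print(lag_corr(xs, 1), lag_corr(xs, 2))
```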
2.3. The EARMA(1,1) model



By making $E_{i-q}$ in (2.9) autoregressive over the previous $E_i$'s, we obtain
a mixed qth-order moving-average, first-order autoregressive process which
we denote by EARMA(1,q). Consider explicitly the case q = 1. The first-order
moving-average and first-order autoregressive process EARMA(1,1)
is given by

$$X_i = \beta E_i + U_i A_{i-1}, \tag{2.12}$$

with

$$A_i = \rho A_{i-1} + V_i E_i, \tag{2.13}$$

for $i = 1, 2, 3, \ldots$ and $A_0 = E_0$, with $\{U_i\}$ and $\{V_i\}$ as defined above.

This sequence of random variables is not Markovian. 

The second-order correlation structure of the process is given by

$$\rho(k) = \rho^{k-1} c(\beta, \rho), \qquad k = 1, 2, \ldots, \tag{2.14}$$

where

$$c(\beta, \rho) = \beta(1-\beta) + \rho(1-\beta)(1-2\beta). \tag{2.15}$$

The point process whose intervals have the EARMA(1,1) structure is
discussed in detail in Jacobs and Lewis (1977). In particular, for $\beta = 1$
it is a Poisson process. The process is very simple to generate on a
computer and is very useful for modelling dependent sequences in queueing
systems (Jacobs, 1978; Lewis and Shedler, 1978).
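To illustrate how simple the generation is, the following Python sketch produces EARMA(1,1) intervals directly from (2.12) and (2.13), with the same $E_i$ entering both $X_i$ and $A_i$, and compares sample correlations with (2.14)-(2.15). Helper names and parameter values are illustrative assumptions.

```python
import random


def earma11(n, lam, beta, rho, rng):
    """Hypothetical helper simulating EARMA(1,1).

    X_i = beta*E_i + U_i*A_{i-1},  U_i = 1 w.p. 1 - beta
    A_i = rho*A_{i-1} + V_i*E_i,   V_i = 0 w.p. rho, A_0 = E_0
    """
    a_prev = rng.expovariate(lam)  # A_0 = E_0
    out = []
    for _ in range(n):
        e = rng.expovariate(lam)   # E_i, shared by X_i and A_i
        u = 1 if rng.random() < 1 - beta else 0
        out.append(beta * e + u * a_prev)
        v = 0 if rng.random() < rho else 1
        a_prev = rho * a_prev + v * e
    return out


def earma11_corr(beta, rho, k):
    """rho(k) = rho**(k-1) * c(beta, rho), from (2.14)-(2.15)."""
    c = beta * (1 - beta) + rho * (1 - beta) * (1 - 2 * beta)
    return rho ** (k - 1) * c


def lag_corr(xs, k):
    """Sample lag-k serial correlation."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    cov = sum((xs[i] - m) * (xs[i + k] - m) for i in range(n - k)) / n
    return cov / var


xs = earma11(200_000, lam=1.0, beta=0.4, rho=0.7, rng=random.Random(11))
for k in (1, 2, 3):
    print(k, round(lag_corr(xs, k), 3), round(earma11_corr(0.4, 0.7, k), 3))
```

With `beta=1.0` the correlation function is identically zero, matching the remark that the process is then Poisson.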

2.4. The pth-order autoregressive model EAR(p)

Quite recently ways have been found to obtain exponential sequences $\{X_i\}$
which have autoregressive structure of order p, and to combine these
with the moving average process to get a mixed autoregressive-moving
average process EARMA(p,q); see Lewis and Lawrance (1978). Another method
of defining pth-order autoregressive exponential sequences, which is
closely related to the DARMA(p,q) process discussed later, and which we
have only just begun to explore, is described here.

This pth-order exponential autoregressive model can be written as

$$X_i = \alpha_{S_i} X_{i-S_i} + \epsilon_{i,S_i}, \tag{2.16}$$

where the $S_i$'s are i.i.d. discrete random variables taking values
$1, 2, \ldots, p$, and $\epsilon_{i,S_i}$ is defined to be 0 w.p. $\alpha_j$ and $E_i$
w.p. $1-\alpha_j$ if $S_i = j$. If one assumes stationarity and that
$X_{i-1}, X_{i-2}, \ldots$ are marginally exponential($\lambda$), then, given $S_i = j$,
$X_i = \alpha_j X_{i-j} + \epsilon_{i,j}$ has the EAR(1) form (2.3) and so is
exponential($\lambda$); hence $X_i$, a random mixture of such variables built from
$X_{i-1}, \ldots, X_{i-p}$, is exponential($\lambda$). The correlation equations from
this process are variants of the familiar Yule-Walker equations. The
model is more tractable than the pth-order autoregressive process given
in Lewis and Lawrance (1978) and is probably simpler to extend to other
distributions than the exponential.
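A Python sketch of (2.16) follows. The weights $w_j = P\{S_i = j\}$, here called `s_probs`, are not named in the text and are our notation, as is the helper name. The check of the lag-1 correlation uses a relation we derive by taking covariances of (2.16) with $X_{i-1}$: for p = 2, $\rho(1) = w_1\alpha_1 + w_2\alpha_2\rho(1)$, so $\rho(1) = w_1\alpha_1/(1 - w_2\alpha_2)$. This is offered as one instance of the Yule-Walker-type equations mentioned above, not as the report's own statement.

```python
import math
import random


def ear_p_random_index(n, lam, alphas, s_probs, rng):
    """Hypothetical helper simulating (2.16): X_i = alpha_{S_i} X_{i-S_i} + eps_{i,S_i}.

    alphas[j-1] = alpha_j and s_probs[j-1] = P{S_i = j} for j = 1..p.
    Given S_i = j, eps is 0 with probability alpha_j and E_i ~ exponential(lam) otherwise.
    Starting values X_{1-p}, ..., X_0 are i.i.d. exponential(lam).
    """
    p = len(alphas)
    xs = [rng.expovariate(lam) for _ in range(p)]
    for _ in range(n):
        j = rng.choices(range(1, p + 1), weights=s_probs)[0]
        a = alphas[j - 1]
        eps = 0.0 if rng.random() < a else rng.expovariate(lam)
        xs.append(a * xs[-j] + eps)  # xs[-j] is X_{i-j}
    return xs[p:]


def lag_corr(xs, k):
    """Sample lag-k serial correlation."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    cov = sum((xs[i] - m) * (xs[i + k] - m) for i in range(n - k)) / n
    return cov / var


alphas, s_probs = [0.5, 0.8], [0.6, 0.4]  # illustrative values
xs = ear_p_random_index(200_000, 1.0, alphas, s_probs, random.Random(16))
m = sum(xs) / len(xs)
print(m, sum(x > 1.0 for x in xs) / len(xs), math.exp(-1))  # mean near 1; tail near e^{-1}
rho1 = s_probs[0] * alphas[0] / (1 - s_probs[1] * alphas[1])  # our derivation, p = 2 only
print(lag_corr(xs, 1), rho1)
```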

A drawback of these EARMA-type processes is that the serial cor- 
relations are all positive, although the scheme given in Gaver and Lewis 
(1978) for a negatively correlated EARMA1 process can probably be 
extended to the complete EARMA(p,q) process. 



2.5. The semi-Markov generated point process with fixed marginal distribution

The question arises as to whether there are interval processes $\{X_i\}$ with
exponential marginal distributions and, for example, ARMA(1,1) second-order
correlation structure and which cover a broader range of correlation
than the EARMA(1,1) process (though perhaps at a cost of more
complicated structure).
We discuss briefly one such process. It is a special case of the
semi-Markov generated point process introduced by Cox (1962) and extended
by Haskell and Lewis (1978). We first describe the two-state semi-Markov
generated model. In this model there are two types of intervals with
distributions $F_1(x)$ and $F_2(x)$, sampled in accordance with a two-state
Markov chain for which the one-step transition matrix is




and the stationary vector is 




(2.18) 



(2.19) 



When we form the point process we assume that no information is available
about the type of interval, i.e., that in the actual bivariate point process
of transitions we suppress knowledge of the type of transition. Then
the distribution of an interval between transitions (events) in the
stationary point process is

$$F_X(x) = \pi_1 F_1(x) + \pi_2 F_2(x) \tag{2.20}$$

and the correlation between $X_i$ and $X_{i+k}$ is

$$\rho(k) = M \beta^k, \qquad k = 1, 2, \ldots, \tag{2.21}$$

where M is a positive constant and $\beta = \alpha_1 + \alpha_2 - 1$.
Thus the correlation structure is that of an ARMA(1,1) process. For a
derivation of this result see Cox and Lewis (1966, Ch. 7, pp. 194-196).
Lewis and Shedler (1973) used this process to model the page exception
process in a multiprogrammed computer system. The difficulty is the
mixture distribution (2.20) for the marginal distribution of
intervals; this seems to limit the utility of the model. However, there
is a way around it which produces a marginally controlled semi-Markov
generated process.

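To obtain an exponential marginal distribution, consider the
following device (Jacobs and Lewis, 1977). Fix $x_0$, where $0 < x_0 < \infty$,
and let

$$F_1(x) = \begin{cases} \dfrac{1 - e^{-\lambda x}}{1 - e^{-\lambda x_0}}, & 0 \le x \le x_0, \\ 1, & x > x_0, \end{cases} \qquad
F_2(x) = \begin{cases} 0, & x < x_0, \\ \dfrac{\int_{x_0}^{x} \lambda e^{-\lambda u}\,du}{e^{-\lambda x_0}}, & x \ge x_0; \end{cases} \tag{2.22}$$

then $F_X(x)$, the marginal distribution of an interval, is exponential($\lambda$)
if we set $\pi_1 = 1 - \exp(-\lambda x_0)$. There is one degree of freedom left in
the matrix P; in addition to $\lambda$, we have free parameters $\pi_1$ (or $x_0$)
and one element of P, although the range of that element is restricted.
What then is the range of $\beta$, and can it be negative?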

Straightforward manipulation shows that 



$$\beta = \tag{2.23}$$
which lies in absolute value between zero and one but can be negative;
therefore the serial correlations can be negative. Thus the model appears
to be broader than the EARMA(1,1) model. The question of comparing the
two models when $\beta$ is positive has not yet been explored; it requires
higher order interval correlations, as discussed by Brillinger (1972).
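The negative-correlation claim can be checked by simulation. This Python sketch assumes the two-state transition matrix is parametrized as $P = \begin{pmatrix} \alpha_1 & 1-\alpha_1 \\ 1-\alpha_2 & \alpha_2 \end{pmatrix}$; that parametrization is our assumption, chosen because its second eigenvalue is $\alpha_1 + \alpha_2 - 1$. Fixing the stationary probability $\pi_1 = 1 - \exp(-\lambda x_0)$ then forces $1 - \alpha_2 = \pi_1(1-\alpha_1)/\pi_2$, so the one free element $\alpha_1$ determines $\beta = 1 - (1-\alpha_1)/\pi_2$, which can be negative; this is our own derivation and is not claimed to be the form of (2.23). The constant in (2.21) is computed as $M = \pi_1\pi_2(\mu_1 - \mu_2)^2/\mathrm{Var}(X)$, with $\mu_1, \mu_2$ the means of $F_1, F_2$ in (2.22), again our derivation (the standard covariance calculation for a sequence modulated by a two-state Markov chain). Helper names and parameter values are hypothetical.

```python
import math
import random


def smp_exponential(n, lam, x0, alpha1, rng):
    """Hypothetical marginally controlled two-state semi-Markov generated sequence.

    ASSUMED transition matrix: [[alpha1, 1 - alpha1], [1 - alpha2, alpha2]], with alpha2
    chosen so that the stationary vector is (pi1, pi2), pi1 = 1 - exp(-lam * x0).
    State 1 draws from F_1 (exponential truncated to [0, x0]); state 2 from F_2 (x0 + exponential).
    Returns the intervals and beta = alpha1 + alpha2 - 1.
    """
    pi1 = 1 - math.exp(-lam * x0)
    pi2 = 1 - pi1
    alpha2 = 1 - pi1 * (1 - alpha1) / pi2
    if not 0.0 <= alpha2 <= 1.0:
        raise ValueError("alpha1 is outside the admissible range for this x0")
    state = 1 if rng.random() < pi1 else 2  # start from the stationary distribution
    out = []
    for _ in range(n):
        if state == 1:
            # inverse of F_1(x) = (1 - e^{-lam x}) / (1 - e^{-lam x0}) on [0, x0]
            out.append(-math.log(1 - rng.random() * pi1) / lam)
            state = 1 if rng.random() < alpha1 else 2
        else:
            out.append(x0 + rng.expovariate(lam))  # F_2: exponential shifted by x0
            state = 2 if rng.random() < alpha2 else 1
    return out, alpha1 + alpha2 - 1


def smp_corr(lam, x0, beta, k):
    """Our expression M * beta**k for (2.21), M = pi1*pi2*(mu1 - mu2)**2 / Var(X)."""
    pi1 = 1 - math.exp(-lam * x0)
    mu1 = 1 / lam - x0 * (1 - pi1) / pi1  # mean of F_1
    mu2 = x0 + 1 / lam                    # mean of F_2
    return pi1 * (1 - pi1) * (mu1 - mu2) ** 2 * lam ** 2 * beta ** k


def lag_corr(xs, k):
    """Sample lag-k serial correlation."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    cov = sum((xs[i] - m) * (xs[i + k] - m) for i in range(n - k)) / n
    return cov / var


x0 = math.log(2)  # gives pi1 = pi2 = 1/2
xs, beta = smp_exponential(200_000, 1.0, x0, 0.2, random.Random(62))
print(beta)  # negative for these values
print(lag_corr(xs, 1), smp_corr(1.0, x0, beta, 1))
print(lag_corr(xs, 2), smp_corr(1.0, x0, beta, 2))
```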

2.6. Generalizations 

The marginally controlled semi-Markov generated sequence $\{X_i\}$ discussed
above can be extended in such a way that $X_i$ will have any distribution,
say F(x). Thus we let

$$F_1(x) = \begin{cases} \dfrac{F(x)}{F(x_0)}, & 0 \le x < x_0, \\ 1, & x \ge x_0; \end{cases} \qquad
F_2(x) = \begin{cases} 0, & x \le x_0, \\ \dfrac{F(x) - F(x_0)}{1 - F(x_0)}, & x > x_0; \end{cases} \tag{2.24}$$

then the marginal distribution of an interval is equal to F(x), from
(2.20), if we set $\pi_1 = F(x_0)$. Note that the model is very non-linear
and the correlation structure is a complicated function of the functional
form of F(x).

The much simpler EARMA structure can be extended to some extent.
Random variables for which the equation (2.2) has a proper solution are
called self-decomposable random variables or random variables of type L.
This class includes random variables with Gamma, Cauchy, Pareto, double
exponential and perhaps many other distributions. For these random
variables, a pth-order autoregressive process can be defined as at (2.16).
The unique feature of the exponential process is that the $\epsilon_i$ which makes
$X_i$ exponential($\lambda$) in (2.2) is again an exponential($\lambda$) random variable,
albeit mixed with an atom at zero. This property, shared with the double
exponential and normal random variables, is what makes it simple to define
a moving-average type process, as at (2.9).
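The splitting (2.24) works for any distribution function F, including a discrete one. The Python sketch below checks numerically that $\pi_1 F_1 + \pi_2 F_2$ with $\pi_1 = F(x_0)$ returns F; the Poisson(2.5) choice for F, the cut point $x_0 = 2$ and the helper names are illustrative assumptions of ours.

```python
import math


def poisson_cdf(x, lam):
    """Distribution function of a Poisson(lam) random variable."""
    if x < 0:
        return 0.0
    return sum(math.exp(-lam) * lam ** j / math.factorial(j)
               for j in range(math.floor(x) + 1))


def split_cdf(F, x0):
    """Return (pi1, F1, F2) as in (2.24): F1 lives on [0, x0], F2 on (x0, inf)."""
    Fx0 = F(x0)

    def F1(x):
        if x < 0:
            return 0.0
        return F(x) / Fx0 if x < x0 else 1.0

    def F2(x):
        return 0.0 if x <= x0 else (F(x) - Fx0) / (1 - Fx0)

    return Fx0, F1, F2


def F(x):
    return poisson_cdf(x, 2.5)


pi1, F1, F2 = split_cdf(F, 2)
for x in (-1, 0, 1, 2, 2.5, 3, 5, 10):
    mix = pi1 * F1(x) + (1 - pi1) * F2(x)
    print(x, round(F(x), 6), round(mix, 6))  # the two columns should agree
```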

3. Count Models: Sequences of discrete-valued random variables

As remarked earlier, most data on point processes is recorded as 
numbers of events in successive fixed-length intervals. Despite this 
fact, most point process models assume that exact times of events are 
known and it is not simple to derive from these models the statistics 
of the counts in fixed intervals. Thus in this area in particular 
flexible models for discrete-valued random variables are needed. 

Another application might be to the modelling of air pollution data in
which concentrations of various chemicals in the air are indicated on a
scale of zero to ten. In general this situation requires multivariate
time series, but space prohibits discussion of multivariate versions of
the DARMA-type processes discussed in this section.

3.1. The first-order autoregressive discrete model (DAR(1))

Again we denote the sequence of discrete-valued random variables by
$\{X_i\}$. If the $X_i$ are counts in a Poisson process then the $X_i$'s are i.i.d.
Poisson-distributed random variables. Once dependence is observed in
data it is useful to assume, as a first cut, that the dependence is
Markovian and use a Markov chain model in which the distribution of
$X_{i+1}$ depends only on the value of $X_i$ and is specified by the transition
matrix P with elements

$$P(k,j) = P\{X_{i+1} = j \mid X_i = k\}, \tag{3.1}$$

with j and k taking values in the space E, a discrete subset of the
real line. Under suitable conditions there is a stationary distribution
$\pi$ for $\{X_i\}$ given by the equation

$$\pi = \pi P. \tag{3.2}$$



The Markov chain model (3.2) is by virtue of its place in the
statistician's toolbox the discrete counterpart of the AR(1) process. However
the AR(1) process has one dependency parameter $\rho$, plus any parameters which
specify the distribution of the $\epsilon_i$'s. The Markov chain on the other hand
can have an infinite number of parameters and in many cases $\pi$ cannot be
obtained explicitly from (3.2). This is awkward for statistical analysis.
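When the state space is finite, (3.2) can at least be solved numerically. The following is a minimal power-iteration sketch in Python, our illustration, assuming an irreducible, aperiodic chain; the three-state matrix is made up for the example.

```python
def stationary_distribution(P, tol=1e-13, max_iter=100_000):
    """Solve pi = pi P (3.2) numerically by power iteration for a finite chain.

    Assumes the chain is irreducible and aperiodic, so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[k] * P[k][j] for k in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
print(stationary_distribution(P))  # approximately [0.25, 0.5, 0.25]
```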

A solution is given by constructing the DAR(1) model (discrete autoregressive
model of order one), which is an analog of the EAR(1) model, as follows.

Let $\{Y_i\}$ be an i.i.d. sequence of random variables taking values in
the space E with distribution $\pi$, and let $\{V_i\}$ be an i.i.d. binary sequence
for which $P\{V_i = 1\} = \rho$. Then

$$X_i = V_i X_{i-1} + (1 - V_i) Y_i, \qquad i = 0, \pm 1, \pm 2, \ldots;\ 0 < \rho < 1, \tag{3.3}$$

that is,

$$X_i = \begin{cases} X_{i-1} & \text{w.p. } \rho, \\ Y_i & \text{w.p. } 1-\rho. \end{cases} \tag{3.4}$$

If $X_0$ has distribution $\pi$, then so does $X_1$, since it is a mixture of two
random variables, $X_0$ and $Y_1$, with distribution $\pi$. Consequently all
the $X_i$, $i = 1, 2, \ldots$, have marginal distribution $\pi$.
Note that $\{X_i\}$ is a Markov chain with transition probabilities

$$P(k,j) = P\{X_{i+1} = j \mid X_i = k\} = \begin{cases} (1-\rho)\,\pi(j), & k \ne j, \\ \rho + (1-\rho)\,\pi(j), & k = j; \end{cases} \tag{3.5}$$

in fact it is a Markov chain in which the correlation structure is
specified by one parameter $\rho$, and with specified marginal (stationary)
distribution $\pi$. Thus $\pi$ may be a Poisson distribution and then the
DAR(1) model is a 2-parameter $(\lambda, \rho)$ Markov chain. The analogy with the
AR(1) model is clear.

As with the EAR(1) model the serial correlations are $\rho(k) = \rho^k \ge 0$.
Extensions to negatively correlated sequences are given in Jacobs and
Lewis (1978).
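For a Poisson marginal the DAR(1) chain is easy to check in Python: the sketch builds the transition probabilities (3.5) on a truncated support, confirms that $\pi P = \pi$, and simulates (3.4) to compare sample correlations with $\rho^k$. The Poisson sampler (Knuth's multiplication method), the truncation point, the helper names and the parameter values are our assumptions.

```python
import math
import random


def poisson_pmf(j, lam):
    return math.exp(-lam) * lam ** j / math.factorial(j)


def dar1_matrix(pi, rho):
    """Transition matrix (3.5) on the finite support {0, ..., len(pi) - 1}."""
    n = len(pi)
    return [[(1 - rho) * pi[j] + (rho if k == j else 0.0) for j in range(n)]
            for k in range(n)]


def poisson_draw(lam, rng):
    """Poisson(lam) variate by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def dar1(n, rho, lam, rng):
    """DAR(1) with Poisson(lam) marginal: X_i = X_{i-1} w.p. rho, else a fresh Y_i."""
    x = poisson_draw(lam, rng)  # X_0 ~ pi
    out = [x]
    for _ in range(n - 1):
        if rng.random() >= rho:  # V_i = 0: take Y_i
            x = poisson_draw(lam, rng)
        out.append(x)
    return out


def lag_corr(xs, k):
    """Sample lag-k serial correlation."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    cov = sum((xs[i] - m) * (xs[i + k] - m) for i in range(n - k)) / n
    return cov / var


lam, rho = 3.0, 0.7
pi = [poisson_pmf(j, lam) for j in range(40)]  # truncated support; tail mass is negligible
P = dar1_matrix(pi, rho)
piP = [sum(pi[k] * P[k][j] for k in range(40)) for j in range(40)]
print(max(abs(a - b) for a, b in zip(piP, pi)))  # tiny: pi is stationary for (3.5)
xs = dar1(200_000, rho, lam, random.Random(1105))
print(lag_corr(xs, 1), lag_corr(xs, 2))          # near rho and rho**2
```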

3.2. The pth-order autoregressive discrete model (DAR(p))

First-order Markov dependence is a special kind of dependence which is
attractive because of analytical tractability considerations, but it is
not necessarily met with in practice. One immediate consequence of the
Markovian property is that runs of a given value, say $X_i = j$, have a
length which is geometrically distributed (Jacobs and Lewis, 1978a) and
this is easily checked in data. If the data fail to have this property,
what other types of dependency can be utilized?

A first direction might be to go to higher order (say pth-order)
autoregression, which is an explicit pth-order Markov structure, and
the DAR(1) model can be extended in this direction. Thus in addition
to the assumptions at (3.3) let $\{A_i\}$ be an i.i.d. sequence of random
variables taking values in $\{1, 2, \ldots, p\}$, with $P\{A_i = j\} = \alpha_j$. Then
the DAR(p) process is defined as

$$X_i = V_i X_{i-A_i} + (1 - V_i) Y_i, \qquad i = 0, \pm 1, \pm 2, \ldots \tag{3.6}$$

so that $X_i$ is (exclusively) either one of the previous p values
$X_{i-1}, \ldots, X_{i-p}$, or the error term $Y_i$. Properties of this model are developed
extensively in Jacobs and Lewis (1978c). When $\alpha_1 = 1$ and all other $\alpha$'s
are zero it is the DAR(1) model.

Yule-Walker equations for the correlations in the stationary DAR(p)
process are given in Jacobs and Lewis (1978c) as well as stationarity
conditions. In particular for p = 2 we have the limiting result

$$V(k,j) = \lim_{i \to \infty} P\{X_{i+1} = k, X_{i+2} = j\} = \begin{cases} \{1-\rho(1)\}\,\pi(k)\pi(j), & k \ne j, \\ \rho(1)\,\pi(j) + \{1-\rho(1)\}\,\pi(j)^2, & k = j, \end{cases} \tag{3.7}$$

where $\rho(1) = \operatorname{corr}(X_i, X_{i-1})$ in the stationary process. Thus, if we let
$X_0$ and $X_1$ have the joint distribution V(k,j), a stationary, second-order
autoregressive process with any marginal distribution can be
generated. A scheme for obtaining sequences which are possibly negatively
correlated is given in Jacobs and Lewis (1978c).
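The recipe for starting a stationary DAR(2) process from V(k,j) can be sketched in Python as follows. Drawing $(X_0, X_1)$ from (3.7) amounts to setting $X_1 = X_0$ with probability $\rho(1)$ and drawing the two independently from $\pi$ otherwise. The value $\rho(1) = \rho\alpha_1/(1-\rho\alpha_2)$ used here is our derivation from taking covariances of (3.6) with $X_{i-1}$, which gives $\rho(1) = \rho\{\alpha_1 + \alpha_2\rho(1)\}$; treat it as an assumption to be checked against Jacobs and Lewis (1978c). The Poisson marginal, helper names and parameter values are hypothetical.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson(lam) variate by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def dar2(n, rho, alpha1, lam, rng):
    """Hypothetical DAR(2) simulator with Poisson(lam) marginal, started from V(k, j) of (3.7).

    P{V_i = 1} = rho, P{A_i = 1} = alpha1, P{A_i = 2} = 1 - alpha1.
    rho1 = rho*alpha1 / (1 - rho*alpha2) is our derived lag-1 correlation.
    """
    alpha2 = 1 - alpha1
    rho1 = rho * alpha1 / (1 - rho * alpha2)
    # (X_0, X_1) ~ V: equal (one draw from pi) w.p. rho1, otherwise independent draws
    x0 = poisson_draw(lam, rng)
    x1 = x0 if rng.random() < rho1 else poisson_draw(lam, rng)
    xs = [x0, x1]
    for _ in range(n - 2):
        if rng.random() < rho:  # V_i = 1: copy X_{i - A_i}
            lag = 1 if rng.random() < alpha1 else 2
            xs.append(xs[-lag])
        else:                   # V_i = 0: fresh Y_i
            xs.append(poisson_draw(lam, rng))
    return xs, rho1


def lag_corr(xs, k):
    """Sample lag-k serial correlation."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    cov = sum((xs[i] - m) * (xs[i + k] - m) for i in range(n - k)) / n
    return cov / var


xs, rho1 = dar2(200_000, 0.8, 0.6, 2.0, random.Random(37))
print(lag_corr(xs, 1), rho1)
print(lag_corr(xs, 2), 0.8 * (0.6 * rho1 + 0.4))  # rho(2) = rho*(alpha1*rho(1) + alpha2)
```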

3.3. The q-th order moving average discrete model (DMA(q))

The other alternative to Markovian dependence (of any order) which is 
usually considered in time series analysis is the finite-length dependence 
produced by the moving-average part of the ARMA(p,q) process (1.1). This 
type of behavior is easily produced for discrete random variables by a 
random index model of the type 



$$X_i = Y_{i-S_i}, \tag{3.8}$$

where the $S_i$ are i.i.d. random variables with $P\{S_i \le k\} = b_k$. Thus we may
write

$$X_i = Y_{i-k} \quad \text{w.p. } b_k - b_{k-1}, \qquad k = 0, \ldots, q;\ b_{-1} = 0. \tag{3.9}$$



The autoregressive process DAR(p) is also a random index model, but the
random indices are not independent. The correlation structure of this
DMA(q) process is easily found to be

$$\rho^{(q)}(k) = \operatorname{corr}(X_i, X_{i-k}) = \begin{cases} \sum_{v=0}^{q-k} (b_v - b_{v-1})(b_{v+k} - b_{v+k-1}), & 1 \le k \le q, \\ 0, & k > q. \end{cases} \tag{3.10}$$

This is the exact analog of (2.11) for the EMA(q) process and the
corresponding formula for the MA(q) process. Note that the DMA(q) process
is not Markovian. Runs properties of the process are given in Jacobs
and Lewis (1978a); the runs are not geometrically distributed.
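The DMA(q) correlations (3.10) and a simulation of the random index model (3.8) can be compared directly. In this Python sketch `b` lists $b_0, \ldots, b_q$ with $b_q = 1$, and the $Y_i$ are uniform on {0, ..., 10}, an arbitrary illustrative choice; since (3.10) does not involve the distribution of the $Y_i$, other choices should give the same correlations. Helper names are hypothetical.

```python
import random


def dma_corr(b, k):
    """rho(k) from (3.10); b[k] = P{S_i <= k} for k = 0..q, so b[q] = 1 and b_{-1} = 0."""
    q = len(b) - 1
    if k > q:
        return 0.0
    p = [b[0]] + [b[v] - b[v - 1] for v in range(1, q + 1)]  # P{S_i = v}
    return sum(p[v] * p[v + k] for v in range(q - k + 1))


def dma(n, b, draw_y, rng):
    """Simulate the random index model X_i = Y_{i - S_i} of (3.8)."""
    q = len(b) - 1
    p = [b[0]] + [b[v] - b[v - 1] for v in range(1, q + 1)]
    ys = [draw_y(rng) for _ in range(n + q)]
    return [ys[i - rng.choices(range(q + 1), weights=p)[0]] for i in range(q, n + q)]


def lag_corr(xs, k):
    """Sample lag-k serial correlation."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    cov = sum((xs[i] - m) * (xs[i + k] - m) for i in range(n - k)) / n
    return cov / var


b = [0.5, 0.8, 1.0]  # q = 2, illustrative
xs = dma(200_000, b, lambda r: r.randint(0, 10), random.Random(310))
for k in (1, 2, 3):
    print(k, round(lag_corr(xs, k), 3), dma_corr(b, k))
```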



3.4. Mixed autoregressive-moving average discrete models

As in the case of the ARMA(p,q) model (1.1), it is useful to have both
autoregressive, Markovian dependence and moving average dependence
combined into one model. In Jacobs and Lewis (1978a) this was done by
replacing the $Y_{i-q}$ term in (3.8) by a discrete autoregression (3.3)
over $Y_{i-q}, Y_{i-q-1}, \ldots$. Clearly this can be extended by replacing
$Y_{i-q}$ by a pth-order autoregression (3.6) over $Y_{i-q}, Y_{i-q-1}, \ldots$ to
obtain a DARMA(p,q) model which is the analog of the EARMA(p,q) model
of Lawrance and Lewis (1978). This is not a complete analog of the
ARMA(p,q) model in that there is no cross-over of the autoregression and
the moving average, but it is in fact possible to do this to obtain a
model called NDARMA(p,q) as follows. Let

$$X_i = V_i X_{i-A_i} + (1 - V_i) Y_{i-S_i}, \qquad i = 0, \pm 1, \pm 2, \ldots, \tag{3.11}$$

where the $A_i$ are i.i.d. random variables taking values in $\{1, 2, \ldots, p\}$
with $P\{A_i = j\} = \alpha_j$; the $S_i$ are i.i.d. random variables taking values in
$\{0, \ldots, q\}$ with $P\{S_i \le k\} = F(k)$; and the $V_i$'s are i.i.d. Bernoulli
random variables with $P\{V_i = 1\} = \rho$.

The model works because a mixture of dependent random variables, all
with marginal distribution $\pi$, has distribution $\pi$ when the mixing indices
are independent of those variables; thus if $X_{i-1}, \ldots, X_{i-p}$ have marginal
distribution $\pi$ then so will $X_i$, since it is a mixture of the dependent
random variables $X_{i-1}, \ldots, X_{i-p}$ and $Y_i, \ldots, Y_{i-q}$.

Note that when $\rho = 0$ we have the DMA(q) process; if in addition $F(0) = 1$
the sequence is i.i.d. since $X_i = Y_i$. When $1 > \rho > 0$ and $F(0) = 1$ we have
the DAR(p) process. Thus the parameters are such that interesting special
cases fall out easily. Moreover the parameter $\rho$ measures the degree of
mixture of Markovian and moving average dependence, and the distributions
of the $A_i$'s and $S_i$'s give a picture of where the dependence is lagged
over previous $X_i$ or $Y_i$ values.

The model (3.11) has not yet been fully explored. At first sight it
seems preferable to the DARMA(p,q) model, possibly because of the
compactness of (3.11) and its close analogy to ARMA(p,q) models. The
DARMA(p,q) and NDARMA(p,q) models are, however, distinct, and in fact
preliminary investigation of the (1,1) case shows that the DARMA(1,1) model
(Jacobs and Lewis, 1978a) has a broader correlation structure than does the
NDARMA(1,1). On the other hand the autoregression is not explicit in
the DARMA(1,1) model. Both models, therefore, will probably be useful
in modelling discrete data such as occur in sampled point processes.
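As an illustration of (3.11), the Python sketch below simulates NDARMA(1,1) with Poisson($\lambda$) values $Y_i$, checks that the empirical frequencies match the Poisson marginal, and checks that $\rho = 0$, $F(0) = 1$ gives an i.i.d. sequence. Parameter values and helper names are ours, and the sketch starts $X_0$ independently of $Y_0$, so only the marginal (not the joint law) is exactly stationary from the first step.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson(lam) variate by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def ndarma11(n, rho, f0, lam, rng):
    """Hypothetical NDARMA(1,1) simulator: X_i = V_i X_{i-1} + (1 - V_i) Y_{i - S_i}.

    P{V_i = 1} = rho, S_i in {0, 1} with P{S_i = 0} = F(0) = f0, Y_i i.i.d. Poisson(lam).
    """
    y_prev = poisson_draw(lam, rng)  # Y_0
    x = poisson_draw(lam, rng)       # X_0 ~ pi
    out = []
    for _ in range(n):
        y = poisson_draw(lam, rng)   # Y_i
        if rng.random() >= rho:      # V_i = 0: take Y_{i - S_i}
            x = y if rng.random() < f0 else y_prev
        out.append(x)                # V_i = 1 keeps X_i = X_{i-1}
        y_prev = y
    return out


def lag_corr(xs, k):
    """Sample lag-k serial correlation."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    cov = sum((xs[i] - m) * (xs[i + k] - m) for i in range(n - k)) / n
    return cov / var


lam = 2.0
xs = ndarma11(200_000, 0.6, 0.5, lam, random.Random(1311))
for j in range(5):
    pmf = math.exp(-lam) * lam ** j / math.factorial(j)
    print(j, round(xs.count(j) / len(xs), 4), round(pmf, 4))  # empirical vs. Poisson
```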

3.5. The marginally controlled semi-Markov generated process

In the structure of the 2-state marginally controlled semi-Markov generated
process detailed at (2.24) no assumption was made about continuity of F(x).
Thus F(x) could be discrete, giving a sequence $\{X_i\}$ with known discrete
marginal distribution F(x) and ARMA(1,1) correlation structure. By
going to an n-state semi-Markov model, a process with ARMA(p,q) correla- 
tion structure can be generated (Haskell and Lewis, 1978) with n a 
function of p and q, and the procedure to obtain a given marginal 
distribution is just an extension of (2.24). Thus we have, in terms 
of the quantification of the process by marginal distribution and 
correlation structure, a direct competitor to the DARMA-type processes. 

Comparison of the two types of discrete processes is interesting and 
points up the simplicity of the DARMA-type processes. In particular the 
correlation structure of the DARMA(p,q) process is explicit in form if 
not in detail and the process is a simple, random linear combination of 
random variables generated from an i.i.d. sequence Y^. This is clearly 
not true for the marginally controlled semi-Markov generated process; 
the recognition that its correlation structure is ARMA-type is accidental 
and not intuitive. Deeper comparison of these processes in terms, say,
of the range of correlation the model will encompass will be instructive.
Here again the DARMA-type processes have an advantage; their correlation
structure is independent of the marginal distribution $\pi$.
4. Summary and Conclusions



We have outlined in this paper three models for discrete-valued and 
positive-valued time series, all of which to some degree satisfy the 
criteria of flexibility or simplicity or both set forth in the intro- 
duction. Perhaps the main point about the models is that they are 
designed to accommodate situations in which the marginal distributions
in the stationary processes are given and are non-normal. 

Properties of these models such as mixing and asymptotic results, 
higher-order moments, distributions of runs for the discrete models and 
sums of random variables and point spectra are considered in the 
references. 

There are many other properties of the processes which are still 
to be explored. Statistical estimation, except in an ad hoc manner and 
for the Markovian cases, is difficult and has yet to be examined.

Extensions to multivariate cases are of great interest for real
applications and have been made to some degree in the context of queues with
correlated service and arrival times (Jacobs, 1978, and Lewis and Shedler,
1978). The DARMA-type processes, in particular, can be easily extended
to coupled equations in the same way as linear processes are extended in
econometric models. They might therefore find use in modelling
multivariate situations such as the number of cars passing different points
on a road evaluated in successive fixed time intervals.

Finally an important problem is to extend the models so as to include
inhomogeneity, particularly of the seasonal type, and the effects of
concomitant or auxiliary variables. Several schemes are under consideration
for these extensions of the models.